1. Introduction. The two-population classification problem is to
identify a population π_0 with one of two given populations π_1 and π_2,
based on observations from these populations on a random vector X. We
shall consider here X to be univariate. Let F_i be the c.d.f. of X
in π_i (i = 0, 1, 2). Thus our problem is to test F_0 = F_1 against
F_0 = F_2. In this paper we have considered some rules which are
suggested in the literature when F_1 and F_2 are not known except that
they are continuous. We have studied the performances of the following
three rules by simulation.

Let X_0, X_1i (i = 1,...,n_1), X_2i (i = 1,...,n_2) be random observations
on X from the populations π_0, π_1, π_2, respectively.

Rule I. 1-NN (nearest neighbor) Rule: Measure the distances of X_0 from
the X_1i's and X_2i's; based on these distances classify X_0 into the
population to which its nearest neighbor belongs.

Rule II. 1-RNN (rank nearest neighbor) Rule: Pool all the observations
and order them.

(a) If X_0 is the largest or the smallest observation, classify X_0
into the population of its nearest neighbor (based on ranks).

(b) If both the right-hand and the left-hand nearest neighbors of X_0
(denoted by V_R and V_L) belong to the same population, classify X_0 into
that population.

(c) If V_R and V_L belong to different populations, classify X_0
into π_1 and π_2 with probabilities 1/2 and 1/2, respectively. (We call
this case a "tie".)

Rule III. 2-RNN Rule: Apply the 1-RNN rule. If a tie occurs, delete
the observations corresponding to V_R and V_L and apply the 1-RNN rule
again on the remaining observations.
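
The three rules can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used for the study: the function names, the `stages` argument (1 for Rule II, 2 for Rule III) and the use of Python's `random` module to break ties are all assumptions made here.

```python
import random
from bisect import bisect_left

def classify_1nn(x0, sample1, sample2):
    """Rule I: label (1 or 2) of the training observation closest to x0."""
    d1 = min(abs(x0 - x) for x in sample1)
    d2 = min(abs(x0 - x) for x in sample2)
    return 1 if d1 <= d2 else 2  # exact ties have probability 0 for continuous data

def classify_rnn(x0, sample1, sample2, stages=1, rng=random):
    """Rules II (stages=1) and III (stages=2), based on ranks in the pooled sample."""
    pooled = sorted([(x, 1) for x in sample1] + [(x, 2) for x in sample2])
    for stage in range(stages):
        values = [v for v, _ in pooled]
        i = bisect_left(values, x0)            # rank position of x0 in the pooled order
        left = pooled[i - 1][1] if i > 0 else None
        right = pooled[i][1] if i < len(pooled) else None
        if left is None and right is None:     # nothing left to compare with
            break
        if left is None or right is None:      # case (a): x0 is an extreme value
            return right if left is None else left
        if left == right:                      # case (b): neighbors agree
            return left
        if stage + 1 < stages:                 # case (c): tie, and a further stage remains
            del pooled[i - 1:i + 1]            # delete V_L and V_R, then try again
    return rng.choice((1, 2))                  # unresolved tie: probabilities 1/2 and 1/2
```

For example, `classify_rnn(0.4, [0.0, 0.3, 0.6], [0.5, 2.0], stages=2)` meets a tie at the first stage (neighbors 0.3 and 0.5), deletes both, and then classifies into population 1 because the remaining neighbors 0.0 and 0.6 both come from it.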
The first rule was suggested and studied by Fix and Hodges (1951).
Das Gupta and Lin (1977) proposed the RNN rules and obtained the asymptotic
probabilities of misclassification (PMC) as n_1, n_2 → ∞. For a given
rule δ, let its PMC under F_0 = F_1 be given by

    α(δ) = Pr[δ classifies X_0 into π_2 | F_0 = F_1].

Let α_1, α_2, α_3 be the asymptotic values of α corresponding to the
above rules I, II and III. Let f_i be the p.d.f. of F_i with respect to
Lebesgue measure (i = 1, 2) and p_i = lim n_i/(n_1 + n_2) (i = 1, 2) as
min(n_1, n_2) → ∞. It was shown by Fix and Hodges (1951) and Das Gupta and
Lin (1977) that


    α_1 = α_2 = [illegible],
    α_3 = α_2 + [illegible],

    where [illegible] = ∫ [illegible] f_1(x) dx.


In this paper we have studied the finite-sample performances of these rules
by estimating α based on samples from sets of two given populations.
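
The relation between the three asymptotic values can be motivated by a short heuristic calculation. The sketch below is not taken from the source: it introduces q(x), the limiting probability that a neighbor of X_0 = x comes from π_2, and it assumes that the labels of successive neighbors are asymptotically independent. It reproduces the stated equality α_1 = α_2; the last line should be checked against Das Gupta and Lin (1977) before being relied upon.

```latex
% Heuristic sketch; q(x) and the independence assumption are introduced here.
\[
  q(x) = \frac{p_2 f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}
\]
% 1-NN: the nearest neighbor of X_0 = x comes from \pi_2 with limiting probability q(x).
\[
  \alpha_1 = \int q(x)\, f_1(x)\, dx
\]
% 1-RNN: both neighbors from \pi_2, or a tie (probability 2q(1-q)) broken with probability 1/2.
\[
  \alpha_2 = \int \bigl[\, q^2 + \tfrac{1}{2}\cdot 2q(1-q) \,\bigr] f_1\, dx
           = \int q\, f_1\, dx = \alpha_1
\]
% 2-RNN: after a tie, the next pair of neighbors is judged by the 1-RNN rule again.
\[
  \alpha_3 = \int \bigl[\, q^2 + 2q(1-q)\, q \,\bigr] f_1\, dx
           = \alpha_2 - \int q(1-q)(1-2q)\, f_1\, dx
\]
```

Under these assumptions the 2-RNN rule gains over the other two wherever q(x) < 1/2 on the bulk of the mass of f_1, that is, wherever a neighbor of X_0 is more likely to come from π_1 than from π_2.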


2. The Experiment. Different steps of our simulation study are given
below.

(i) Two known but different univariate distributions F_1 and F_2
are chosen.

(ii) Random samples of sizes n_1 and n_2 from F_1 and F_2,
respectively, are obtained; these samples are called training samples.

(iii) A random sample of size n_0 from F_0 = F_1 is obtained. We
call this a test sample.

(iv) For each observation in the test sample a given classification
rule δ (one of the above three rules) is applied; let n_02 be the
number of the observations in the test sample which are classified by δ
into F_2. Let α̂(δ) = n_02/n_0, the proportion of the test sample
misclassified into π_2.

(v) Steps (ii)-(iv) are repeated r times for new training and test
samples keeping n_1, n_2 and n_0 fixed.

(vi) The mean and the standard error of the mean based on the r values of
α̂(δ) thus obtained are recorded.

(vii) Steps (ii)-(vi) are repeated for different values of n_1, n_2
and r.

(viii) F_2 is characterized by a parameter θ. For different values of
θ steps (i)-(vii) are repeated.
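
Steps (ii)-(vi) translate into a short simulation loop. The following is a minimal sketch under stated assumptions, not the original program: it uses Python's `random` module in place of the CDC 6400 library subroutine, the 1-NN rule as the example rule, and the hypothetical names `nn_rule` and `estimate_pmc`.

```python
import random
import statistics

def nn_rule(x0, s1, s2):
    """1-NN rule: label (1 or 2) of the training observation closest to x0."""
    d1 = min(abs(x0 - x) for x in s1)
    d2 = min(abs(x0 - x) for x in s2)
    return 1 if d1 <= d2 else 2

def estimate_pmc(rule, draw1, draw2, n1, n2, n0, r, rng):
    """Steps (ii)-(vi): mean and standard error of alpha-hat over r replications."""
    alpha_hats = []
    for _ in range(r):                                   # step (v)
        train1 = [draw1(rng) for _ in range(n1)]         # step (ii)
        train2 = [draw2(rng) for _ in range(n2)]
        test = [draw1(rng) for _ in range(n0)]           # step (iii): F_0 = F_1
        n02 = sum(rule(x, train1, train2) == 2 for x in test)  # step (iv)
        alpha_hats.append(n02 / n0)
    mean = statistics.mean(alpha_hats)                   # step (vi)
    se = statistics.stdev(alpha_hats) / r ** 0.5         # standard error of the mean
    return mean, se

if __name__ == "__main__":
    rng = random.Random(0)
    mean, se = estimate_pmc(nn_rule,
                            lambda g: g.gauss(0.0, 1.0),   # F_1 = N(0,1)
                            lambda g: g.gauss(2.0, 1.0),   # F_2 = N(2,1)
                            n1=25, n2=25, n0=100, r=20, rng=rng)
    print(f"mean = {mean:.3f}, s.e. = {se:.3f}")
```

Passing the rule and the two samplers as arguments keeps steps (vii) and (viii) outside the function: they become plain loops over sample sizes and over θ.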

Our choices are given in the following table.

F_1            F_2            Parameters                            n_1 = n_2   n_0    r

N(0,1)         N(θ,1)         θ = 0, ±1, ±2, ±3                         25      100    20
                                                                       100      400     4

N(0,1)         N(0,θ)         θ = 2, 3, 1/2, 1/3                        25      100    20
                                                                       100      400     4

e^{-x}         θe^{-θx}       θ = 1, 2, 3, 4, 1/2, 1/3, 1/4, 1/8       100      100    20
(density)

Cauchy(0,1)    Cauchy(θ,1)    θ = 0, ±1, ±2, ±3                         25      100    20
                                                                       100      400     4

Samples are generated by a library subroutine available on the CDC
6400 at the University of Minnesota.

Note 1. In the following tables "Half" refers to counting one-half of the
number of ties as misclassified, and "Rhalf" refers to resolving
the ties by the use of a uniform random number generator.
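
The two conventions of Note 1 differ only in how ties enter the count of misclassified observations. A minimal sketch, assuming each test observation has already been recorded as 1, 2 or "tie" (the encoding and the function names are illustrative, not from the original program):

```python
import random

def pmc_half(outcomes):
    """'Half': count one-half of the ties as misclassified into population 2."""
    ties = outcomes.count("tie")
    wrong = outcomes.count(2)
    return (wrong + 0.5 * ties) / len(outcomes)

def pmc_rhalf(outcomes, rng):
    """'Rhalf': resolve each tie with a uniform random number (below 1/2 means population 2)."""
    wrong = sum(1 for o in outcomes
                if o == 2 or (o == "tie" and rng.random() < 0.5))
    return wrong / len(outcomes)
```

With outcomes `[1, 2, "tie", "tie"]`, `pmc_half` gives (1 + 1)/4 = 0.5, while `pmc_rhalf` gives 0.25, 0.5 or 0.75 depending on the random draws. Both have the same expectation, but "Rhalf" carries the extra randomization variance.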


Note 2. In some of the following tables EPMC denotes an estimate of
the asymptotic PMC (α_1 = α_2) of the 1-NN and 1-RNN rules. These are
derived by the method of runs as suggested in Das Gupta and Lin (1977).

3. Tables

Table 3.1

Proportion of test sample misclassified into π_2.
F_1 = N(0,1), F_2 = N(θ,1); n_1 = n_2 = n_0 = 100, r = 20.
Optimal (assuming θ is known and for minimax rule) PMC is Φ(-|θ|/2).

θ      Ties     1NN            RNN            2-RNN          Opt.
                MEAN   s.e.    MEAN   s.e.    MEAN   s.e.    PMC

 0     Half     .479   .017    .479   .013    .479   .014    .500
       Rhalf                   .485   .016    .484   .015

 1     Half     .374   .018    .381   .014    .343   .021    .308
       Rhalf                   .374   .016    .340   .021

-1     Half     .426   .020    .426   .014    .421   .025    .308
       Rhalf                   .432   .017    .425   .024

 2     Half     .195   .018    .194   .018    .165   .017    .159
       Rhalf                   .196   .018    .164   .018

-2     Half     .245   .020    .254   .018    .258   .019    .159
       Rhalf                   .251   .018    .255   .018

 3     Half     .086   .012    .089   .012    .062   .010    .067
       Rhalf                   .084   .011    .061   .009

-3     Half     .105   .013    .114   .012    .119   .015    .067
       Rhalf                   .113   .011    .118   .015
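
The optimal PMC Φ(-|θ|/2) quoted for the N(0,1) versus N(θ,1) case can be checked numerically. With θ known, the minimax rule cuts at the midpoint θ/2, so an observation from N(0,1) is misclassified when it falls more than |θ|/2 on the wrong side. A small sketch using the standard-library `statistics.NormalDist` (the function name `optimal_pmc` is illustrative):

```python
from statistics import NormalDist

def optimal_pmc(theta):
    """Phi(-|theta|/2): PMC of the midpoint rule for N(0,1) versus N(theta,1)."""
    return NormalDist().cdf(-abs(theta) / 2.0)

if __name__ == "__main__":
    for theta in (0, 1, 2, 3):
        print(theta, optimal_pmc(theta))
```

For θ = 1, 2, 3 this gives approximately 0.3085, 0.1587 and 0.0668, which agree with the tabulated .308, .159 and .067 up to the last printed digit.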

Table 3.2

Proportion of test sample misclassified into π_2.
F_1 = N(0,1), F_2 = N(θ,1); [sample sizes illegible].

θ      Ties     1NN            RNN            EPMC    2-RNN          Opt.
                MEAN   s.d.    MEAN   s.d.            MEAN   s.d.    PMC

 0     Half     .490   .018    .482   .008    .4?     --     --      .500
       Rhalf                   .475   .006            --     --

 1     Half     .415   .010    .398   .014    .36     --     --      --
       Rhalf                   .404   .024            --     --

-1     Half     .402   .010    .39?   .007    .38     --     --      --
       Rhalf                   .397   .007            --     --

 2     Half     .208   .010    .210   .010    .22     --     --      --
       Rhalf                   .208   .009            --     --

-2     Half     .209   .012    .213   .008    .22     --     --      --
       Rhalf                   .215   .009            --     --

 3     Half     .088   .011    .083   .009    .10     --     --      --
       Rhalf                   .082   .007            --     --

-3     Half     .104   .012    .101   .008    .09     --     --      --
       Rhalf                   .107   .013            --     --

("--" marks an entry missing from the source; "?" marks an illegible digit.)


Proportion of test sample misclassified into π_2.
F_1 = N(0,1), F_2 = N(0,θ); n_1 [illegible] 100, r [illegible].

θ      Ties     1NN            RNN            2-RNN
                MEAN   s.e.    MEAN   s.e.    MEAN   s.e.

2.0    Half     .375   .009    .39?   .008    .353   .014
       Rhalf                   .393   .010    .355   .015

3.0    Half     .399   .014    .346   .013    .293   .019
       Rhalf                   .337   .013    .295   .018

.5     Half     .417   .017    .438   .015    .461   .020
       Rhalf                   .337   .018    .460   .021

1/3    Half     .359   .022    .376   .018    .393   .019
       Rhalf                   .380   .019    .391   .019

("?" marks an illegible digit.)

Proportion of test sample misclassified into π_2.

Table 3.5

Proportion of test sample misclassified into π_2.
f_1(x) = e^{-x}, f_2(x) = θe^{-θx}; n_1 = n_2 = n_0 = 100, r = 20.

θ      Ties     1NN            RNN            EPMC    2-RNN
                MEAN   s.e.    MEAN   s.e.            MEAN   s.e.

1      Half     .508   .016    .509   .013    .47     .503   .013
       Rhalf                   .523   .016            .517   .015

2      Half     .442   .015    .434   .014    .38     .442   .016
       Rhalf                   .438   .017            .444   .016

3      Half     .402   .014    .388   .011    .36     .39?   .013
       Rhalf                   .387   .011            .387   .014

4      Half     .335   .009    .330   .007    .32     .327   .009
       Rhalf                   .336   .008            .330   .009

.5     Half     .453   .010    .453   .009    .38     .430   .010
       Rhalf                   .4??   .013            .430   .014

1/3    Half     .410   .011    .395   .008    .36     .346   .010
       Rhalf                   .386   .009            .335   .010

1/4    Half     .354   .015    .364   .012    .32     .290   .013
       Rhalf                   .372   .013            .292   .013

1/8    Half     ?      .014    .248   .012    .22     .181   .011
       Rhalf                   .259   .014            .185   .010

("?" marks an illegible digit or entry.)
Table 3.6

Proportion of test sample misclassified into π_2.
F_1 = Cauchy(0,1), F_2 = Cauchy(θ,1); n_1 = n_2 = 25, n_0 = 100, r = 20.

θ      Ties     1NN            RNN            2-RNN
                MEAN   s.e.    MEAN   s.e.    MEAN   s.e.

 0     Half     .473   .018    .430   .015    .488   .027
       Rhalf                   .493   .018    .505   .029

 1     Half     .406   .022    .418   .022    .397   .031
       Rhalf                   .408   .025    .395   .033

-1     Half     .398   .016    .410   .012    .369   .021
       Rhalf                   .410   .013    .385   .022

 2     Half     .288   .021    .297   .021    .248   .027
       Rhalf                   .288   .021    .238   .028

-2     Half     .247   .012    .264   .012    .248   .017
       Rhalf                   .276   .015    .252   .019

 3     Half     .161   .020    .168   .017    .103   .017
       Rhalf                   .161   .018    .099   .017

-3     Half     .153   .015    .156   .013    .130   .014
       Rhalf                   .154   .013    .125   .014
Table 3.7

Proportion of test sample misclassified into π_2.
F_1 = Cauchy(0,1), F_2 = Cauchy(θ,1); n_1 = n_2 = 100, n_0 = 400, r = 4.

θ      Ties     1NN            RNN            2-RNN
                MEAN   s.e.    MEAN   s.e.    MEAN   s.e.

 0     Half     .494   .015    .514   .013    .506   .017
       Rhalf                   .529   .014    .512   .021

 1     Half     .411   .010    .426   .009    .381   .018
       Rhalf                   .446   .018    .390   .017

-1     Half     .457   .029    .446   .033    .394   .028
       Rhalf                   .454   .025    .393   .025

 2     Half     .284   .007    .278   .008    .217   .033
       Rhalf                   .283   .009    .219   .024

-2     Half     .152   .016    .318   .022    .254   .014
       Rhalf                   .321   .015    .257   .010

 3     Half     .152   .016    .154   .015    .088   .018
       Rhalf                   .417   .012    .087   .014

-3     Half     .204   .034    .199   .032    .105   .011
       Rhalf                   .198   .034    .103   .012
4. Concluding Remarks. For all the three rules considered, it seems
that α̂ has a definite tendency to decrease as θ moves away (in either
direction) from its value under F_1 = F_2.

For small n_1 = n_2 there is not any marked difference in the performances
of these three rules, although the 2-RNN rule may be a bit better. However,
for large n_1 = n_2 the 2-RNN rule seems to have markedly better performance
except for the cases N(0,1) vs. N(0,θ), θ < 1. This report is the first
empirical study on the performances of the 1NN and RNN rules, although a more
detailed study, especially on multi-stage RNN rules, is called for.